ANTI-CONCENTRATION FOR POLYNOMIALS OF INDEPENDENT RANDOM VARIABLES

RAGHU MEKA, OANH NGUYEN, AND VAN VU 


Abstract. We prove anti-concentration results for polynomials of independent random variables with arbitrary
degree. Our results extend the classical Littlewood-Offord result for linear polynomials, and improve
several earlier estimates.

We discuss applications in two different areas. In complexity theory, we prove near optimal lower bounds 
for computing the Parity, addressing a challenge in complexity theory posed by Razborov and Viola, and also 
address a problem concerning OR functions. In random graph theory, we derive a general anti-concentration 
result on the number of copies of a fixed graph in a random graph. 


1. Introduction 

Let $\xi$ be a Rademacher random variable (taking value $\pm 1$ with probability $1/2$) and $A = \{a_1, \dots, a_n\}$ be a
multi-set in $\mathbb{R}$ (here $n \to \infty$). Consider the random sum

$$S := a_1 \xi_1 + \dots + a_n \xi_n,$$

where $\xi_i$ are iid copies of $\xi$.

In 1943, Littlewood and Offord, in connection with their studies of random polynomials [SD], raised the
problem of estimating $\mathbf{P}(S \in I)$ for arbitrary coefficients $a_i$. They proved the following remarkable theorem:

Theorem 1.1. There is a constant $B$ such that the following holds for all $n$. If all coefficients $a_i$ have
absolute value at least 1, then for any open interval $I$ of length 1,

$$\mathbf{P}(S \in I) \le B n^{-1/2} \log n.$$


Shortly after the Littlewood-Offord result, Erdős [T2] removed the $\log n$ term to obtain the optimal bound
using an elegant combinatorial proof. Littlewood-Offord type results are commonly referred to as
anti-concentration (or small-ball) inequalities. Anti-concentration results have been developed by many
researchers over the decades, and have recently found important applications in the theories of random matrices
and random polynomials; see, for instance, [22] for a survey.
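
To see why the $n^{-1/2}$ rate cannot be improved, one can take all $a_i = 1$: then $S$ only takes values of a fixed parity, so an open interval of length 1 contains at most one of them, and the largest point probability is $\binom{n}{\lfloor n/2 \rfloor} 2^{-n}$. The following Python sketch (the function name is ours, chosen for illustration) computes this quantity exactly and rescales it by $\sqrt{n}$; the rescaled values should approach $\sqrt{2/\pi}$.

```python
from math import comb, pi, sqrt

def max_point_probability(n: int) -> float:
    """Largest value of P(S = s) for S = xi_1 + ... + xi_n, xi_i iid Rademacher.

    S = 2*K - n with K ~ Binomial(n, 1/2), so the mode sits at K = n // 2.
    """
    return comb(n, n // 2) / 2 ** n

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        p = max_point_probability(n)
        # p * sqrt(n) should stabilise near sqrt(2 / pi)
        print(n, p, p * sqrt(n), sqrt(2 / pi))
```

Since the rescaled values stay bounded away from zero, no bound of order $o(n^{-1/2})$ can hold for all coefficient sets with $|a_i| \ge 1$.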

The goal of this paper is to extend Theorem 1.1 to higher degree polynomials. Consider

$$P(x_1, \dots, x_n) := \sum_{S \subset \{1, \dots, n\};\ |S| \le d} a_S \prod_{j \in S} x_j. \qquad (1)$$

The first result in this direction, due to Costello, Tao, and the third author, m , is 

Theorem 1.2. There is a constant B such that the following holds for all d, n. If there are coefficients 

as with absolute value at least 1, then for any open interval I of length 1, 


The exponent tends very fast to zero with $d$, and it is desirable to improve this bound. For the case
$d = 2$, Costello [5] obtained the optimal bound $n^{-1/2+o(1)}$. In a more recent paper [23], Razborov and Viola
proved

Theorem 1.3. There is a constant $B$ such that the following holds for all $d, n$. If there are pairwise disjoint
subsets $S_1, \dots, S_r$ each of size $d$ such that $a_{S_i}$ have absolute value at least 1 for all $i$, then for any open
interval $I$ of length 1,




This theorem improves the bound in Theorem 1.2 to m via a simple counting argument. 


Researchers in analysis also considered anti-concentration of polynomials, for entirely different reasons.
Carbery and Wright [7] considered polynomials with $\xi_i$ being iid Gaussian and showed

Theorem 1.4. There is a constant $B$ such that

$$\mathbf{P}\big(|P(\xi_1, \dots, \xi_n)| \le \varepsilon \operatorname{Var}(P(\xi_1, \dots, \xi_n))^{1/2}\big) \le B d \varepsilon^{1/d}.$$


Their result has been extended by Mossel, O’donnell and Oleszkiewicz EH to general variables, at a cost of 
an extra term on the right hand side, which involves the regularity of P (see Section 3). 


The goal of this paper is to further improve these anti-concentration bounds, with several applications 
in complexity theory. Our new results will be nearly optimal in a wide range of parameters. Let [n] = 
{1, 2,..., n}. Following [23], we first introduce a definition 

Definition 1.5. For a degree $d$ multi-linear polynomial of the form (1), the rank of $P$, denoted by $\operatorname{rank}(P)$,
is the largest integer $r$ such that there exist disjoint sets $S_1, \dots, S_r \subset [n]$ of size $d$ with $|a_{S_j}| \ge 1$ for $j \in [r]$.

Our first main result concerns the Rademacher case. Let $\xi_i$, $i = 1, \dots, n$, be iid Rademacher random variables.

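Theorem 1.6. There is an absolute constant $B$ such that the following holds for all $d, n$. Let $P$ be a
polynomial of the form (1) whose rank $r \ge 2$. Then for any interval $I$ of length 1,

$$\mathbf{P}(P(\xi_1, \dots, \xi_n) \in I) \le \min\left( \frac{B d^{4/3} \sqrt{\log r}}{r^{1/(4d+1)}},\ \frac{\exp(B d^2 (\log\log r)^2)}{\sqrt{r}} \right).$$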












For the case when $d$ is fixed, it has been conjectured [22] that $\mathbf{P}(P(\xi_1, \dots, \xi_n) \in I) = O(r^{-1/2})$. This
conjectural bound is a natural generalization of the Erdős-Littlewood-Offord result and is optimal, as shown by
taking $P = (\xi_1 + \dots + \xi_n)^d$ with $n$ even. For this $P$, the rank $r = \Theta(n)$ and $\mathbf{P}(|P| \le 1/2) = \mathbf{P}(P = 0) = \Theta(n^{-1/2})$.
Our result confirms this conjecture up to the subpolynomial term $\exp(B d^2 (\log\log r)^2)$.


In applications it is important that we can allow the degree $d$ to tend to infinity with $n$. Our bounds in
Theorem 1.6 are non-trivial for degrees up to $c \log r/\log\log r$, for some positive constant $c$. Up to the $\log\log$
term, this is as good as it gets, as one cannot hope to get any non-trivial bound for polynomials of degree
$\log_2 r$. For example, the degree $d$ polynomial on $2^d \cdot d$ variables defined by $P(\xi) = \sum_{i=1}^{2^d} \prod_{j=1}^{d} (\xi_{ij} + 1)$, where
$\xi_{ij}$ are iid Rademacher random variables, has $r = 2^d$ and $\mathbf{P}(P(\xi) = 0) = \Omega(1)$.
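
The example can be checked by hand, assuming it is read as $P(\xi) = \sum_{i=1}^{2^d} \prod_{j=1}^{d} (\xi_{ij} + 1)$: each product is nonnegative and vanishes unless all of its $d$ variables equal $+1$, so $P = 0$ exactly when all $2^d$ products vanish, which happens with probability $(1 - 2^{-d})^{2^d} \to 1/e$. The Python sketch below (function names are ours, for illustration) confirms this closed form by exhaustive enumeration for small $d$.

```python
from fractions import Fraction
from itertools import product

def vanishing_probability_bruteforce(d: int) -> Fraction:
    """P(P(xi) = 0) by enumerating all sign patterns on 2^d * d variables."""
    blocks = 2 ** d
    n = blocks * d
    zero_count = 0
    for xi in product((-1, 1), repeat=n):
        value = 0
        for i in range(blocks):
            term = 1
            for j in range(d):
                term *= xi[i * d + j] + 1  # factor (xi_ij + 1) is 0 or 2
            value += term
        if value == 0:
            zero_count += 1
    return Fraction(zero_count, 2 ** n)

def vanishing_probability_formula(d: int) -> Fraction:
    """Closed form (1 - 2^-d)^(2^d): every one of the 2^d products must vanish."""
    return (1 - Fraction(1, 2 ** d)) ** (2 ** d)

if __name__ == "__main__":
    for d in (1, 2):
        print(d, vanishing_probability_bruteforce(d), vanishing_probability_formula(d))
    print(float(vanishing_probability_formula(10)))  # close to 1/e
```

Enumeration is only feasible for very small $d$ (the case $d = 2$ already has $2^8$ sign patterns), which is why the closed form is used for larger degrees.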


Next, we generalize our result to non-Rademacher distributions. As a first step, we consider the $p$-biased
distribution on the hypercube. For $p \in (0,1)$, let $\mu_p$ denote the Bernoulli variable with $p$-biased distribution:
$\mathbf{P}_{x \sim \mu_p}(x = 0) = 1 - p$, $\mathbf{P}_{x \sim \mu_p}(x = 1) = p$, and let $\mu_p^n$ be the product distribution on $\{0,1\}^n$.


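Theorem 1.7. There is an absolute constant $B$ such that the following holds. Let $P$ be a polynomial of the
form (1) whose rank $r \ge 2$. Let $p$ be such that $\tilde r := 2^d \alpha^d r \ge 3$ where $\alpha := \min\{p, 1 - p\}$. Then for any
interval $I$ of length 1,

$$\mathbf{P}_{x \sim \mu_p^n}(P(x) \in I) \le \min\left( \frac{B d^{4/3} (\log \tilde r)^{1/2}}{\tilde r^{1/(4d+1)}},\ \frac{\exp(B d^2 (\log\log \tilde r)^2)}{\sqrt{\tilde r}} \right).$$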


The distribution $\mu_p^n$ plays an essential role in probabilistic combinatorics. For example, it is the ground
distribution for the random graphs $G(N,p)$ (with $n := \binom{N}{2}$). We discuss an application in the theory of
random graphs in the next section.

Finally, we present a result that applies to virtually all sets of independent random variables, with a weak 
requirement that these variables do not concentrate on a short interval. 

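Theorem 1.8. There is an absolute constant $B$ such that the following holds. Let $\xi_1, \dots, \xi_n$ be independent
(but not necessarily iid) random variables. Let $P$ be a polynomial of the form (1) whose rank $r \ge 2$. Assume
that there are positive numbers $p$ and $\varepsilon$ such that for each $1 \le i \le n$, there is a number $y_i$ such that
$\min\{\mathbf{P}(\xi_i \le y_i), \mathbf{P}(\xi_i > y_i)\} = p$ and $\mathbf{P}(|\xi_i - y_i| \ge 1) \ge \varepsilon$. Assume furthermore that $\tilde r := (p\varepsilon)^d r \ge 3$. Then
for any interval $I$ of length 1,

$$\mathbf{P}(P(\xi_1, \dots, \xi_n) \in I) \le \min\left( \frac{B d^{4/3} (\log \tilde r)^{1/2}}{\tilde r^{1/(4d+1)}},\ \frac{\exp(B d^2 (\log\log \tilde r)^2)}{\sqrt{\tilde r}} \right).$$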


Notice that even in the Gaussian case, Theorem 1.8 is incomparable to Theorem 1.4. If we use Theorem 1.4
to bound $\mathbf{P}(P \in I)$ for an interval $I$ of length 1, then we need to set $\varepsilon = \operatorname{Var}(P)^{-1/2}$, and the resulting
bound becomes $B d \operatorname{Var}(P)^{-1/(2d)}$. For sparse polynomials, it is typical that $r$ is much larger than $(\operatorname{Var} P)^{1/d}$, and
in this case our bound is superior. To illustrate this point, let us fix a constant $d > c > 0$ and consider

$$P := \sum_{S \subset \{1, \dots, n\},\ |S| = d} a_S \prod_{i \in S} \xi_i,$$

where as are iid random Bernoulli variables with P(as = 1) = n~‘^. It is easy to show that the following 
holds with probability 1 — o(l) 
• For any set $X \subset \{1, \dots, n\}$ of size at least $n/2$, there is a subset $S \subset X$, $|S| = d$, such that $a_S = 1$.

• The number nonzero coefficients is at most 

In other words, these two conditions are typical for a sparse polynomial with roughly nonzero coefh- 
cients. On the other hand, if the above two conditions holds, then we have Var(P) < n‘^~^ and r > n/2d 
(by a trivial greedy algorithm). Our bound implies that 


P(P G /) < 

while the Carbery-Wright bound only gives


P(P G /) < 

The rest of the paper is organized as follows. In Section 2 below, we discuss applications in complexity
theory and graph theory, with one long proof delayed to Section 7. Sections 3 and 4 are devoted to some
combinatorial lemmas. In Section 5, we treat polynomials with Rademacher variables. The generalizations
are discussed in Section 6. All asymptotic notations are used under the assumption that $n$ tends to infinity.
All the constants are absolute, unless otherwise noted.


2. Applications 

2.1. Applications in complexity theory. We use our anti-concentration results to prove lower bounds 
for approximating Boolean functions by polynomials in the Hamming metric. The notion of approximation 
we consider is as follows. 

Definition 2.1. Let $\varepsilon > 0$ and $\mu$ be a distribution on $\{0,1\}^n$. For a Boolean function $f : \{0,1\}^n \to \{0,1\}$
and a polynomial $P : \mathbb{R}^n \to \mathbb{R}$, we say $P$ $\varepsilon$-approximates $f$ with respect to $\mu$¹ if

$$\mathbf{P}_{x \sim \mu}(P(x) = f(x)) \ge 1 - \varepsilon.$$

We define $d_{\mu,\varepsilon}(f)$ to be the least $d$ such that there is a degree $d$ polynomial which $\varepsilon$-approximates $f$ with
respect to $\mu$.


An alternate (dual) way to view the above notion is in terms of distributions over low-degree polynomials
("randomized polynomials") which approximate the function in the worst case. In particular, by Yao's
min-max principle, $d_{\mu,\varepsilon}(f) \le d$ for every distribution $\mu$ if and only if there exists a distribution $\mathcal{D}$ over
polynomials of degree at most $d$ which approximates $f$ in the worst case: for all $x$, $\mathbf{P}_{P \sim \mathcal{D}}[P(x) = f(x)] \ge 1 - \varepsilon$.

Approximating Boolean functions by polynomials in the Hamming metric was first considered in the works of 
Razborov [21] and Smolensky [2S] over fields of finite characteristic as a technique for proving lower bounds 
for small-depth circuits. This was also studied in a similar context over real numbers by the works of m, m-, 
the latter work uses them to prove lower bounds for $\mathrm{AC}^0$. More recently, in a remarkable result, Williams
P7] (also see EHHH) used polynomial approximations in Hamming metric for obtaining the best known 
algorithms for all-pairs shortest path and other related algorithmic questions. Here, we study lower bounds 
for the existence of such approximations. 

¹ We drop $\mu$ in the description when it is clear from context or if it is the uniform distribution.

Approximating Parity. Let $\mathrm{par}_n : \{0,1\}^n \to \{0,1\}$ denote the parity function: $\mathrm{par}_n(x) = x_1 \oplus x_2 \oplus \dots \oplus x_n$
(where arithmetic is mod 2).

In [23], Razborov and Viola introduced another way to look at this problem. For two functions
$f, g : \{0,1\}^n \to \mathbb{R}$, define their "correlation" to be the quantity

$$\mathrm{Cor}_n(f, g) := \mathbf{P}_x(f(x) = g(x)) - 1/2,$$

where $x$ is uniformly distributed over $\{0,1\}^n$. They highlighted the following challenge.
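
For small $n$ the correlation can be computed by exhaustive enumeration. The Python sketch below is only an illustration under our own choices: the helper names and the two sample polynomials are hypothetical and not taken from [23]. It shows that a degree-1 polynomial has correlation 0 with parity on 3 bits, while the degree-3 multilinear representation of parity attains the maximum value $1/2$.

```python
from itertools import product

def parity(x):
    """par_n(x) = x_1 XOR ... XOR x_n for x in {0,1}^n."""
    return sum(x) % 2

def correlation(f, g, n):
    """Cor_n(f, g) = P_x(f(x) = g(x)) - 1/2 with x uniform on {0,1}^n."""
    agree = sum(1 for x in product((0, 1), repeat=n) if f(x) == g(x))
    return agree / 2 ** n - 0.5

def linear_poly(x):
    """Hypothetical degree-1 polynomial: P(x) = x_1."""
    return x[0]

def parity_poly_3(x):
    """Degree-3 real polynomial that equals parity on {0,1}^3."""
    a, b, c = x
    return a + b + c - 2 * (a * b + a * c + b * c) + 4 * a * b * c

if __name__ == "__main__":
    print(correlation(parity, linear_poly, 3))    # low degree: no advantage
    print(correlation(parity, parity_poly_3, 3))  # full degree: exact agreement
```

The enumeration costs $2^n$ evaluations, so it is useful only as a sanity check on toy instances.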

Challenge. Exhibit an explicit Boolean function $f : \{0,1\}^n \to \{0,1\}$ such that for any real polynomial $P$
of degree $\log_2 n$, one has

$$\mathrm{Cor}_n(f, P) \le o(1/\sqrt{n}).$$

This challenge is motivated by studies in complexity theory and has connections to many other problems,
such as the famous rigidity problem; see [23] for more discussion.

The Parity function seems to be a natural candidate in problems like this. Razborov and Viola, using
Theorem 1.3, proved

Theorem 2.2. [23] For all sufficiently large $n$, $\mathrm{Cor}_n(\mathrm{par}_n, P) \le 0$ for any real polynomial $P$ of degree at
most $\frac{1}{2} \log_2 \log_2 n$.


With Theorem 1.6 we obtain the following improvement, which gets us within a $\log\log n$ factor of the
Challenge.

Theorem 2.3. For all sufficiently large $n$, $\mathrm{Cor}_n(\mathrm{par}_n, P) \le 0$ for any real polynomial $P$ of degree at most

$$\frac{\log n}{15 \log\log n}.$$


Proof. Let d be the degree of P. Following the arguments in the proof of |23l Theorem 1.1], we can 
assume that P contains at least ffn pairwise disjoint subsets Si each of size d and non-zero coefficients. It 
suffices to show that the probability that P outputs a boolean value is at most 1/2. By replacing P by 
q{xi,... ,Xn) ■= Pi{xi + l)/2,..., {xn + l)/2), one can convert the problem into polynomial of the same 
degree defined on {±1}"', in other words, on Rademacher variables. Then by Theorem 1.6, this probability 
is bounded by This is less than 1/2 for every d < when n is sufficiently large. □ 


Approximating AND/OR. One of the main building blocks in obtaining polynomial approximations in
the Hamming metric is the following result for approximating the OR function².

Claim 2.4. For all $\varepsilon \in (0,1)$ and distributions $\mu$ over $\{0,1\}^n$, there exists a polynomial $P : \mathbb{R}^n \to \mathbb{R}$ of
degree at most $O((\log n)(\log 1/\varepsilon))$ such that $\mathbf{P}_{x \sim \mu}(P(x) = \mathrm{OR}(x)) \ge 1 - \varepsilon$.

By iteratively applying the above claim, Aspnes, Beigel, Furst, and Rudich [2] showed that $\mathrm{AC}^0$ circuits
of depth $d$ have $\varepsilon$-approximating polynomials of degree at most $O(((\log s)(\log(1/\varepsilon)))^{d} \cdot (\log(s/\varepsilon))^{d-1})$. We
prove the following lower bound for such approximations:


² $\mathrm{OR}(x_1, \dots, x_n)$ is 1 if any of the bits $x_i$ is non-zero.

Theorem 2.5. There is a constant $c > 0$ and a distribution $\mu$ on $\{0,1\}^n$ such that for any polynomial
$P : \{0,1\}^n \to \mathbb{R}$ of degree $d \le c (\log\log n)/(\log\log\log n)$,

$$\mathbf{P}_{x \sim \mu}(P(x) = \mathrm{OR}(x)) \le 2/3.$$


To the best of our knowledge, no $\omega(1)$ lower bound was known for approximating the OR function. We give
an explicit distribution (directly motivated by the upper bound construction in 0 ) under which OR has no 
1/3-error polynomial approximation. The distribution p on {0,1}" we consider is as follows: 

(1) With probability 1/2 output a: = 0. 

(2) With probability 1/2 pick an index i € [D] uniformly at random and output x ^ some 

suitably chosen parameters a^D. 


The analysis then proceeds at a high level as in the lower bound for parity. However, we need some extra care 
with the inductive argument as unlike for parity, we can’t consider arbitrary fixings of subsets of coordinates 
of the OR function. We get around this hurdle by instead only considering fixing parts of the input to 0 
and decreasing the bias p to make sure that these coordinates are indeed set to 0 with high probability. The 
details are deferred to Section 7.


2.2. The number of small subgraphs in a random graph. Consider the Erdős-Rényi random graph
$G(N,p)$. Let $H$ be a small fixed graph (a triangle or $C_4$, say). The problem of counting the number of
copies of $H$ in $G(N,p)$ is a fundamental topic in the theory of random graphs (see, for instance, the
textbooks [illlS]). In fact, one can consider the more general problem of counting the number of copies of $H$ in
a random subgraph of any deterministic graph $G$ on $N$ vertices, formed by choosing each edge of $G$ with
probability $p$. We denote this random variable by $F(H,G,p)$. In this setting we understand that $H$ has
constant size, and the size of $G$ tends to infinity.

It has been noticed that $F$ can be written as a polynomial in terms of the edge-indicator random variables.
For example, the number of copies of $C_4$ (the cycle of length 4) is

$$\sum_{ijkl} \xi_{ij} \xi_{jk} \xi_{kl} \xi_{li},$$

where the summation is over all quadruples $ijkl$ which form a $C_4$ in $G$ and the Bernoulli random variable $\xi_{ij}$
represents the edge $ij$. Clearly, any polynomial of this type has $n = e(G)$ iid $p$-biased Bernoulli variables $\xi_{ij}$,
and its degree equals the number of edges of $H$. The rank $r$ of $F$ is exactly the size of the largest collection
of edge-disjoint copies of $H$ in $G$.
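
As an illustration of this polynomial representation, the Python sketch below builds the $C_4$-counting polynomial of a small host graph and evaluates it on a given set of edge indicators. The data layout (edges as `frozenset` pairs, a dictionary of 0/1 indicators) and the function names are our own assumptions, not notation from the paper.

```python
from itertools import combinations

def four_cycles(vertices, edges):
    """List every 4-cycle of the host graph G as a list of its four edges."""
    edge_set = {frozenset(e) for e in edges}
    cycles = []
    for a, b, c, d in combinations(sorted(vertices), 4):
        # Four labelled vertices span exactly three distinct 4-cycles.
        for cyc in ((a, b, c, d), (a, b, d, c), (a, c, b, d)):
            cyc_edges = [frozenset((cyc[i], cyc[(i + 1) % 4])) for i in range(4)]
            if all(e in edge_set for e in cyc_edges):
                cycles.append(cyc_edges)
    return cycles

def count_c4(cycles, xi):
    """Evaluate the degree-4 polynomial: sum over cycles of the product of edge indicators."""
    total = 0
    for cyc_edges in cycles:
        term = 1
        for e in cyc_edges:
            term *= xi[e]
        total += term
    return total

if __name__ == "__main__":
    vertices = range(4)
    edges = list(combinations(vertices, 2))       # host graph G = K_4
    cycles = four_cycles(vertices, edges)
    xi = {frozenset(e): 1 for e in edges}         # keep every edge
    print(count_c4(cycles, xi))
    xi[frozenset((0, 1))] = 0                     # delete edge {0, 1}
    print(count_c4(cycles, xi))
```

To sample from the random-subgraph model one would draw each `xi[e]` independently as 1 with probability $p$; the polynomial itself depends only on the host graph $G$.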

The polynomial representation has been useful in proving concentration (i.e., large deviation) results for $F$ (see
[BUS], for instance). Interestingly, it turns out that one can also use this representation to derive anti-concentration
results, in particular bounds on the probability that the random graph has exactly $m$ copies of $H$.

By Theorem 1.7, we have

Corollary 2.6. Assume that $p$ is a constant in $(0,1)$. Then for fixed $H$ and any integer $m$ which may
depend on $G$,

$$\mathbf{P}(F(H,G,p) = m) \le r^{-1/2+o(1)},$$

where $r$ is the size of the largest collection of edge-disjoint copies of $H$ in $G$. In particular, if $G = K_N$, then

$$\mathbf{P}(F(H,K_N,p) = m) \le N^{-1+o(1)}.$$


A similar argument can be used to deal with the number of induced copies of $H$, which can also be written
as a polynomial with degree at most $\binom{v}{2}$, with $v$ being the number of vertices of $H$. Details are left out as
an exercise.

Finally, let us mention that in a recent paper [13], Gilmer and Kopparty obtained a precise estimate for
$\mathbf{P}(F(H, K_N, p) = m)$ in the case when $H$ is a triangle. Their approach relies on a careful treatment of the
characteristic function. It remains to be seen if this method applies to our more general setting.


3. Regular polynomials 

Our proofs of anti-concentration bounds use the techniques developed in the context of bounding the noise 
sensitivity of polynomial threshold functions in the works nni HU EH]. In particular, we use the concept of 
regular polynomials, the invariance principle of Mossel, O’donnell, and Oleszkiewicz HU, and the regularity 
lemma of [101 [H]. In this and the following section, we discuss these tools. 

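To start, we define regular polynomials and discuss an anti-concentration result for them. The influence of
the $i$-th variable on $P$ is defined to be $\mathrm{Inf}_i = \mathrm{Inf}_i(P) = \sum_{S \ni i} a_S^2$. Since $\operatorname{Var}(P) = \sum_{S \ne \emptyset} a_S^2$, we have

$$\operatorname{Var}(P) \le \sum_{i=1}^{n} \mathrm{Inf}_i \le d \operatorname{Var}(P). \qquad (2)$$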
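
Inequality (2) holds because each nonempty $S$ contributes $a_S^2$ once to $\operatorname{Var}(P)$ and $|S| \le d$ times to $\sum_i \mathrm{Inf}_i$. The Python sketch below checks this on a small hypothetical polynomial of our own choosing (the dictionary encoding and function names are illustrative assumptions), and also verifies the variance formula against brute-force enumeration over Rademacher inputs.

```python
from itertools import product
from math import isclose, prod

# Hypothetical example: P = 3 + 2*x0 - x0*x1 + 0.5*x1*x2*x3, so n = 4 and d = 3.
COEFFS = {
    frozenset(): 3.0,
    frozenset({0}): 2.0,
    frozenset({0, 1}): -1.0,
    frozenset({1, 2, 3}): 0.5,
}

def influences(coeffs, n):
    """Inf_i(P) = sum of a_S^2 over all S containing i."""
    return [sum(a * a for S, a in coeffs.items() if i in S) for i in range(n)]

def variance_formula(coeffs):
    """Var(P) = sum of a_S^2 over nonempty S (Rademacher inputs)."""
    return sum(a * a for S, a in coeffs.items() if S)

def variance_bruteforce(coeffs, n):
    """Variance of P over the uniform distribution on {-1, 1}^n."""
    values = [
        sum(a * prod(xi[j] for j in S) for S, a in coeffs.items())
        for xi in product((-1, 1), repeat=n)
    ]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

if __name__ == "__main__":
    n, d = 4, 3
    inf = influences(COEFFS, n)
    var = variance_formula(COEFFS)
    print(inf, var, variance_bruteforce(COEFFS, n))
    print(var <= sum(inf) <= d * var)  # inequality (2)
```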

Assume the random variables are ordered such that $\mathrm{Inf}_1 \ge \mathrm{Inf}_2 \ge \dots \ge \mathrm{Inf}_n$. Let $\tau > 0$. The $\tau$-critical index
of $P$ is the least $i$ such that $\mathrm{Inf}_{i+1} \le \tau \sum_{j=i+1}^{n} \mathrm{Inf}_j$. If this does not hold for any $i$, we say that $P$ has
$\tau$-critical index $\infty$. If $P$ has $\tau$-critical index 0, we say that $P$ is $\tau$-regular. The following is a corollary of
strong results from |7] and m- 

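Proposition 3.1. Let $P$ be a non-constant polynomial of the form (1). Let $\tau > 0$. If $P$ is $\tau$-regular, then

$$\mathbf{P}(|P(\xi_1, \dots, \xi_n)| \le a) \le C d \left( \left( \frac{a}{\sqrt{\operatorname{Var}(P)}} \right)^{1/d} + \tau^{1/(4d+1)} \right)$$

for every $a > 0$.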


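Proof. Let $\tilde\xi_1, \dots, \tilde\xi_n$ be independent standard Gaussian variables. Notice that

$$\operatorname{Var}(P(\xi_1, \dots, \xi_n)) = \operatorname{Var}(P(\tilde\xi_1, \dots, \tilde\xi_n)).$$

Our setting satisfies Hypothesis H4 of the invariance principle of Mossel, O'Donnell, and Oleszkiewicz
(Theorem 3.19 there) with $r = 4$. Using that theorem, one obtains

$$\mathbf{P}(|P(\xi_1, \dots, \xi_n)| \le a) \le \mathbf{P}(|P(\tilde\xi_1, \dots, \tilde\xi_n)| \le a) + C d \tau^{1/(4d+1)}. \qquad (3)$$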


3 
Now, for the Gaussian case, it was proved in [7, Theorem 8] that for every $a > 0$,

$$\mathbf{P}(|P(\tilde\xi_1, \dots, \tilde\xi_n)| \le a) \le \frac{C d a^{1/d}}{(\operatorname{Var}(P))^{1/2d}}. \qquad (4)$$


Combining (3) and (4), we get the desired bound.


□ 


4. A REGULARIZATION LEMMA 


Proposition 3.1 requires $P$ to be $\tau$-regular with a small $\tau$, and in general
there is no guarantee for this assumption. In order to go from the regular case to the general case, we will
use the following regularization lemma, whose proof is a slight modification of [10, Theorem 1.1] (the version
below gives us better quantitative bounds in our applications). The main idea is to condition on the random
variables with large influence. With high probability, the resulting polynomial is either regular or dominated
by its constant part.

For a set $S \subset [n]$, we consider a random assignment $\rho \in \{\pm 1\}^{|S|}$ which assigns values $\pm 1$ to the variables $(\xi_i)_{i \in S}$.
We say that "$\rho$ fixes $S$". For each such $\rho$, the polynomial $P$ becomes a polynomial of $(\xi_i)_{i \notin S}$, which is denoted
by $P_\rho$. We write $P_\rho = P^*(\rho) + q_\rho((\xi_i)_{i \notin S})$, where $P^*(\rho)$ is the constant part of $P_\rho$, consisting of monomials of
$(\xi_i)_{i \in S}$ only. For $C > 0$ and $0 < \beta < 1$, we say that $P_\rho$ is $(C, \beta)$-tight if

< \P*{P)\ 

and 

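$$\mathbf{P}_{(\xi_i)_{i \notin S}}\left( |q_\rho| \le \tfrac{1}{2} |P^*(\rho)| \right) \ge 1 - \beta. \qquad (6)$$

Note that it is always true that $\mathbf{E}_{(\xi_i)_{i \notin S}}\, q_\rho = 0$. We shall see later that (5) actually implies (6).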

Proposition 4.1. There exist absolute constants C and C' such that the following holds true. Let P($i, ..., f,n) 
be a a degree-d polynomial, let 0 < t,[3 < 4. Let a = (^(dloglog 1//3 + dlogd) and t' = (C'dlogdlog 4)'^r. 
Let M G N such that < n. Then, there exists a decision tree of depth at most with P at the root, 
variables ’s at each internal node, and a degree-d polynomial Pp at each leaf p, with the following property: 
with probability at least 1 — (1 — 2 ^)^, a random path from the root P reaches a leaf p such that Pp is either 
t'- regular or {C, l3)-tight. 


3.1 would yield our desired bound in Theorem 1.6 if r is small (say at most r ^). However, 


Proof. First, we consider the case when the r-critical index of P is large. For a positive integer K, denote 
by [K] the set {1,..., K}. 

Lemma 4.2. There exists a constant C such that the following holds true. Let 0 < t, /3 < 4 ftg deter¬ 
ministic constants that may depend on n. Suppose that P has r-critical index at least K = ^, where 
a = (^(d log log 1//3 + dlogd). Then for at least fraction of restrictions p fixing [K], the polynomial Pp 
is (C, f3)-tight. 


Roughly speaking, the $(C, \beta)$-tightness asserts that the resulting polynomial $P_\rho$ has a large constant term
compared to the random part, and therefore it concentrates around the constant part.

Proof. Since the proof is essentially identical to the proof of [10, Lemma 3.5], we only provide a sketch here.
Without loss of generality, assume that $\operatorname{Var}(P) = 1$. We first show that

P,(l^’•(rtl > 5^) > ^ (7) 

where by Pp we mean the probability with respect to ^i,..., fx- Observe that Varp(P*(p)) = X) 05 ^sc[A'] — 

Var(P) = 1. Moreover, by definition of critical index, 

" 1 

E Inb(P) < (l-r)^^Inb(P) < de-“ < -. (8) 

i^[K] i=l 

Hence, 1 > Varp(P*(p)) = Var(P) - a| > 1 - J2i^[K] Infi(P) > 5 . Then, we use the following 

Theorem 


Theorem 4.3. [TTj . also uni Theorem 2.5]) There is a universal constant Cq > 1 such that for any 
non-zero degree-d polynomial P : { — 1,1}" —>■ ffi with E(P) = 0, we have 


P 



VVa^\ 

) 


> 


Co"' 


Let C > Cq. Applying the above Theorem to P*{p) — EpP*(/9) if EpP*(p) > 0 and —P*{p) + EpP*(p) 
otherwise gives Q. 

Next, we show that 

P,(^Var(,,)>^(]c(logl)]) (9) 

Indeed, let Q[p) = Var(gp). By triangle inequality and Bonami-Beckner inequality (see, for instance, [TUI 
Theorem 2.1], or [U], [H]), one can show that \\Q{p )\\2 = ^y'EpQ'^ip) < 3"^ EpInfi(Pp) = 

3‘‘de~°‘ where the last inequality is just Q. From this, we use the following Theorem 

Theorem 4.4. ([3|, [TT] , also [B Theorem 2.2]) Let P : {—1,1}" —>■ M &e a degree-d polynomial. For any 
t > e'^, we have 

P{]P]>t]]P]]2)<exp{-nif/‘^)). 


Using this Theorem for the polynomial Q and t = d‘^C‘^ log*^ C, we get ([^. 

From Q and (|^, with probability at least over all possible p, ([^ happens. For each such p, using 
Theorem |4.4| for g, we obtain 

P«.+„....«.(kp|>^|P*(p)|)<P€.+,.....«„ (^kp|>^Clog^) ' llgpll2^ </3, 
which gives § and completes the proof of Lemma |4.2[ □ 


Next, we consider the case when $P$ has small critical index. We will use the following lemma [10, Lemma
3.9], which asserts that by assigning values to the random variables with large influences, with significant
probability, one gets a regular polynomial.

Lemma 4.5. Let $C$ be the constant in Lemma 4.2. There exists an absolute constant $C'$ such that the
following holds. Let 0 < t < Assume that P has r-critical index k G [n]. Let p be a random restriction 
fixing [A:], and t' = (C'dlogdlog With probability at least 2 ^ over the choice of p, the restricted 

polynomial Pp is t' -regular. 


Combining Lemmas 4.2 and 4.5, we get

Lemma 4.6. Let P{fi,... ,fn) be a a degree-d polynomial, 0 < t, P < ^. Let a = C{d\og\ogl/Pd\ogd) 
and t' = (C'dlogdlog Assume that Infi > /n /2 • • • > Then one of the following holds true. 

(1) P is T-regular. 

(2) The T-critical index of P is at least “ and the conclusion of Le mma\ 4-.^ holds. 

(3) The T-critical index of P is k < ^ and the conclusion of LemmaU.S] holds. 


Now, we are ready for the proof of Proposition 4.1 
At first, if P is not r-regular, we apply Lemma 


4.6 


The strategy is to apply Lemma [4~6| repeatedly M times, 
to P and obtain an initial tree of depth at most -. We 


know that at least 2 ^ fractions of the restricted Pp are ’’good”, i.e., either r'-regular or (C,/3)-tight. We 
keep them as leaves of our final tree and leave them untouched during the next stages. At the second stage, 
for each of the remaining ’’bad” polynomials Pp, we order the unrestricted variables in decreasing order of 
their influences in Pp, and then apply lemma 4.6 to it. Note that probability of reaching a bad leaf in this 
second tree is at most (1 — 2 ^)^- Continuing in this manner M times, we get the desired tree and complete 
the proof of Proposition 4.1. □


5. Proof of Theorem 1.6


The high-level argument for the first bound of Theorem 1.6 is as follows. If the polynomial is sufficiently regular, we
apply the anti-concentration property of regular polynomials; the latter property in turn follows from the
invariance principle and a similar anti-concentration property for polynomials with respect to the Gaussian
distribution.


To complete the argument, we use the regularity lemma, which shows that any polynomial can be written as
a small-depth decision tree where most leaves are labeled by polynomials which are either (1) regular or (2)
fixed in sign with high probability over a uniformly random input. In the first case,
we get a regular polynomial of high rank (as the tree is shallow) and we apply the previous argument. In
the second case, we argue directly that the probability of taking the value 0 is small.


To prove the second bound of Theorem 1.6, we follow the same conceptual approach but adopt a more careful analysis
following the work of Kane [18]. We defer the details to the actual proof.


5.1. First bound. Without loss of generality, we can assume that L is centered at 0 and r is larger than 
some constant. We can also assume that d < . ^ ” because otherwise > 1 and the desired 

— log log r — 

bound becomes trivial. 


Let T S (0, |) and let /3 = A We will use Proposition 4.1 to reduce to the regular case. Let a, t' be as in 
that Proposition, i.e., a = (^(dloglog j -I- dlogd) and r' = (C'dlogdlog ^Yt. Let M = . Call a leaf of 
the decision tree good if $P_\rho$ is either $\tau'$-regular or $(C, \beta)$-tight and bad otherwise. Now, following our decision
tree, we have


P(P G I) 


< P (reaching a bad leaf) + ^ P (reaching p and Pp G I) 

p is a good leaf 

< (1 - + XI P(reaching p and Pp G I) 

p is a good leaf 

< 2exp ^ P(reaching p and Pp G/). 

p is a good leaf 


( 10 ) 


Now, for each good leaf p, Pp is either (C, /3)-tight or r'-regular. Let S be the set of indices i of the internal 
nodes that lead to p. In other words, p fixes S. Since the depth of the decision tree is at most M“ < one 
has l^l < I and so qp contains at least r 12 monomials of degree d each, with mutually disjoint sets of random 
variables, and with coefficients at least 1 in magnitude. Therefore, (Pp) = (gp) > r/2. 


Assume Pp is (C,/3)-tight, then by ([^, one has |P*(p)| = f^(v^) > 2. This together with ([^ give 
P(reaching p and Pp G J) = P 5 ,,igs(reaching p)P^^p^s(Pp G I) 

< Pe^.iGS(reaching p^P^^p^silqpl > \P*{p)\ - 1 > ^\P*{p)\) 

< /3P{,pes(reaching p) = (reaching p). 


Next, assume that Pp is r'-regular. 
P (reaching p and Pp G I) 


By Proposition 3.1 


= Pj.pgs (reaching 
< Pj.pgs (reaching 


< Pj^pgs (reaching 


p)PiiMsiPp G 



( 11 ) 


( 12 ) 


Since the events that the root P reaches different leaves on the tree are 
we get that for any 0 < r < 1, 


disjoint, from ( [Io| , and ( [I^ , 


P(P G/) < 2 exp 


{d log log r + d log d) 


Cd 

pll2d 





1/4 ^ 

+ -. (13) 


because we assumed that d < 

— log log r 


Set T = »C^+P°&pdlos\osr+dlosd) ^ 

r 

right of (13) becomes 2r“^ and the third term is bounded from above by B '^ 'i/'^f/i) 
proof of the first bound. 


The first term on the 
-. This completes the 


5.2. Second bound. We next build on the arguments in the previous section to prove the second bound in
Theorem 1.6.

The main ingredient in proving the second bound is the following technical lemma of [18], which says that a
random restriction of a sufficiently regular polynomial will likely have a much larger expectation compared
to its standard deviation. This is useful because polynomials with large expectation relative to standard
deviation have small probability of vanishing by tail bounds such as Theorem 4.4. In case the tail bound
does not give a sufficiently good bound, we recurse on the new restricted polynomial. To state the lemma we
need the following definition: For $\gamma > 0$, call a polynomial $P : \mathbb{R}^n \to \mathbb{R}$ $\gamma$-spread if $\operatorname{Var}(P(\xi_1, \dots, \xi_n)) \ge \mathbf{E}(P(\xi_1, \dots, \xi_n))^2 / \gamma^2$.

Proposition 5.1. Let b, n be such that b\n. Let P : K" —>■ K fee a non-constant r-regular degree d polynomial. 
Let Si,. ■ ■ ,Sb be a partition of [n] into equal-sized blocks. For £ G [fe], and an assignment S {1, —IjWVS'f 
to the variables not in Si, let P^i : —)■ K denote the polynomial obtained by fixing the variables not in Si 

to Then, 

b 

Pj! (Pji is "/-spread) < 2^^^^ ■ ( 7 ^ -|- 1 ) • (^Vb , 

1=1 

where for clarity, the assignments for different I are independent. 

In particular, there exists an index I G [fe], such that 

P^i{P^i is "/-spread) < • ( 7 ^ -|- 1) • (^/Vb. 


For the proof, we need the following definitions from m 


• For a function / : K” —M and a vector v G ffi", Dyf{x) = v ■ V/(x). 

• Let f = (Cl, ■ • ■, Cn) and C = (Ci, • ■ • > ?n) be independent collections of Rademacher random variables. 
For a polynomial P : M" —>■ K, define 


a{P) = 


^min 



\Dcpm^ \\ 

mw ))' 


The following claims are implicit in m- 

Lemma 5.2. For any polynomial P : K" —)■ K, Var(P) < 2‘^('^)(E(P)^ -|- Var(P)) • a{P). 


Proof. The claim is proved in m Lemma 21]. □ 

Lemma 5.3. Let h,n be such that b\n. Let P : K" —> K fee o non-constant r-regular degree d polynomial. 
Let Si,... ,Sb be a partition of [n] into equal-sized blocks. For £ G [fe], and an assignment C* G {1, —IjWVS'f 
to the variables not in Si, let P^i : -G K denote the polynomial obtained by fixing the variables not in Si 

to C*. Then, 

b 

'^E,^i{a{P^^)) = 0{d^a{P)Vb + d'^brP^^'^')), (14) 

e=i 

where for clarity, the assignments for different I are independent. 


Proof. Notice that the right-hand side of (14) doesn’t change if the assignments C* are obtained by choosing 
n random variables Ci, ■ • •, Cn and then looking at the fe different restrictions C* ■ Tbe lemma is then proved 
in [m Proposition 19] (essentially Equation (4)). □ 


Combining the above two claims gives us the proposition. 
Proof of Proposition 5.1. For any index $\ell \in [b]$, we have
P{P^i is 7 -spread) = P{j'^Var(P^t) > 

Yar{P^e) 


= P 


> 


p{p^,r+Yar{p^,) “ 72 -h r 

< P(a(P^f > 1 /( 7 ^ -I- 1)) (by Lemma 5.2 applied to P^t) 

< • (72 -I- 1) • E(a(P^f)) (by Markov’s inequality). 


Therefore, by Lemma 5.3 

b 


is 7-spread) < • ( 7 ^ -H 1) • E(a(PjO) 


e^i 


e^i 


= 20 id) . (^2 . o{d^a{P)Vb + 

^ 20id) . (^2 ^ p 6 t 1/®'^). 

The claim now follows as a{P) < 1 by definition. 


□ 


We are now ready to prove the second bound of Theorem 1.6. Similar to the proof of the first bound, without
loss of generality, we can assume that / = [—1,1], r is sufficiently large, and that d < Let, 

f{r,d) = max{P(P(^) G I) : P degree d polynomial with rank(P) > r}. (15) 


Let $P$ be a degree $d$ multi-linear polynomial with $\operatorname{rank}(P) = r$ achieving the maximum $f(r,d)$. For fixed
parameters r G (0,1/3) and 7 > 2 to be chosen later, let ^ and let T be a decision tree as guaranteed 
by Proposition 4.1 with M = where a and t' are as in that Proposition. Then the depth of the tree is 


at most 5 1 and as in the proof of the first bound. 


P(P (0 e L) < 2 exp (- 


U- + P[Pp(Oe/| 


ACd-aJ r 


Pp is r'-regular]. 


(16) 


Now, consider a leaf $\rho$ so that $Q = P_\rho$ is $\tau'$-regular. Note that $\operatorname{rank}(Q) \ge r/2$ and in particular $Q$ is non-constant. Fix $b \le r/4$, a parameter to be chosen later. Fix a partition $S_1, \dots, S_b$ of the variables of $Q$ such that for $\ell \in [b]$, the restricted polynomials $Q^\ell$ obtained by fixing the variables not in $S_\ell$ each satisfy $\operatorname{rank}(Q^\ell) \ge \lfloor \operatorname{rank}(Q)/b \rfloor$ (this can be done, for instance, by first partitioning the variables witnessing $\operatorname{rank}(Q)$). Note that if the number of variables in $Q$ is not divisible by $b$, we only need to add a few variables to $Q$ without affecting its output or its regularity. Now, by Proposition 5.1 applied to the polynomial $Q$, there exists $\ell \in [b]$ such that the polynomial $Q^\ell$ obtained by a random assignment to the variables not in $S_\ell$ is $\gamma$-spread with probability at most

\[ 2^{O(d)} \cdot (\gamma^2 + 1) \cdot \big(1/\sqrt{b} + (\tau')^{1/(8d)}\big). \]


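Therefore,
\[
\begin{aligned}
\mathbf{P}(Q(y) \in I) &\le 2^{O(d)} \cdot (\gamma^2 + 1) \cdot \big(1/\sqrt{b} + (\tau')^{1/(8d)}\big) \cdot \mathbf{P}(Q^\ell(z) \in I \mid Q^\ell \text{ is } \gamma\text{-spread}) \\
&\qquad + \mathbf{P}(Q^\ell(z) \in I \mid Q^\ell \text{ is not } \gamma\text{-spread}) \\
&\le 2^{O(d)} \cdot (\gamma^2 + 1) \cdot \big(1/\sqrt{b} + (\tau')^{1/(8d)}\big) \cdot f(\lfloor \operatorname{rank}(Q)/b \rfloor, d) \\
&\qquad + \mathbf{P}(Q^\ell(z) \in I \mid Q^\ell \text{ is not } \gamma\text{-spread}).
\end{aligned}
\]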
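Finally, to bound the last term, observe that if $Q^\ell$ is not $\gamma$-spread and not identically zero, then
\[
\begin{aligned}
\mathbf{P}(Q^\ell(z) \in I) = \mathbf{P}(|Q^\ell(z)| \le 1) &\le \mathbf{P}\big(|Q^\ell(z) - \mathbf{E}(Q^\ell)| \ge |\mathbf{E}(Q^\ell)| - 1\big) \\
&\le \mathbf{P}\Big(|Q^\ell(z) - \mathbf{E}(Q^\ell)| \ge \frac{\gamma \operatorname{Var}(Q^\ell)^{1/2}}{2}\Big)
\end{aligned}
\]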

< 2exp (by Theorem [F4|) , 

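where in the next to last inequality, we use the inequalities $|\mathbf{E}(Q^\ell)| \ge \gamma \cdot \operatorname{Var}(Q^\ell)^{1/2} \ge \gamma \cdot \operatorname{rank}(Q^\ell)^{1/2} \ge \gamma \cdot (r/2b)^{1/2} \ge 2$ and so $|\mathbf{E}(Q^\ell)| - 1 \ge |\mathbf{E}(Q^\ell)|/2 \ge \gamma \operatorname{Var}(Q^\ell)^{1/2}/2$.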

Combining the above arguments, we get that if $b \le r/4$,

P{Q{x) G /) < • (7^ + 1) • (iM + •/(Lr/6J,d) + 0(l)exp (-^( 1 ) 71 / 2 '^) . 


Hence, by (16) we have that 

P(P(^) G /) < 2 exp (-^) + ^ + • ( 7 ' + 1) • (l/v^ + r'i/ 8 ^) • f{[r/b\ ,d) + 0{l) exp (- 


(17) 


Now, as in the proof of the first bound of Theorem 1.6 set r = — - log r{d log log r+d log d) ^ ^ 

and 7 = (Clogr)^^/^. Then, 

f{r,d) < {Clogr)f^ ■ /(r^-i/'^^d) • 

(here we used the fact that $f(r,d) \ge \Omega(r^{-1/2})$ by choosing the polynomial $p(\xi_1, \dots, \xi_{rd}) = \xi_1 \xi_2 \cdots \xi_d + \xi_{d+1} \cdots \xi_{2d} + \cdots + \xi_{rd-d+1} \cdots \xi_{rd}$, and so all the other terms on the right-hand side of (17) are dominated by
the term (C'logr))‘^‘^ • f(r^~^^‘^'^,d) • 7 --i/ 8 d ) 

Let $a = 1 - 1/(4d)$. Applying this recurrence relation $k$ times with $r^{a^k} = C$ (so $k = O(d \log\log r)$), we get


^k-l 


Cd 


f{r,d) < (Clogr))'=^'' maM •/(r“', d) • 


7/8d 


G=0 


\[ \le e^{O(d^2 (\log\log r)^2)} \, r^{-(1 - a^k)/2} = e^{O(d^2 (\log\log r)^2)} \, r^{-1/2}, \]

completing the proof of the second bound and hence Theorem 1.6.
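The exponent $(1 - a^k)/2$ in the last display is the geometric sum of the per-step savings: with $a = 1 - 1/(4d)$, iterating a recurrence that gains a factor $r^{-a^i/(8d)}$ at step $i$ accumulates $\sum_{i=0}^{k-1} a^i/(8d) = (1 - a^k)/2$. The Python sketch below checks this identity exactly; the per-step factor $r^{-a^i/(8d)}$ is our reading of the recurrence, and the function names are illustrative.

```python
from fractions import Fraction

def accumulated_exponent(d, k):
    """Sum of a^i / (8d) for i = 0, ..., k-1, where a = 1 - 1/(4d)."""
    a = 1 - Fraction(1, 4 * d)
    return sum((a**i for i in range(k)), Fraction(0)) / (8 * d)

def closed_form(d, k):
    """Closed form (1 - a^k) / 2 of the same sum."""
    a = 1 - Fraction(1, 4 * d)
    return (1 - a**k) / 2

def identity_holds(max_d=6, max_k=25):
    """Check the identity for all 1 <= d <= max_d and 1 <= k <= max_k."""
    return all(accumulated_exponent(d, k) == closed_form(d, k)
               for d in range(1, max_d + 1)
               for k in range(1, max_k + 1))
```

Since $r^{a^k} = C$ is a constant, the factor $r^{a^k/2}$ is $O(1)$, which is why the final bound can be written with $r^{-1/2}$.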


6. General distributions 


6.1. Proof of Theorem 1.7. We reduce the $p$-biased case to the uniform distribution at the expense of a loss in the rank of the polynomial and then apply Theorem 1.6.


First notice that if $x \sim \mu_p$, then $1 - x \sim \mu_{1-p}$. And so, by replacing the polynomial $P$ by $Q(x_1, \dots, x_n) = P(1 - x_1, \dots, 1 - x_n)$, we can exchange the roles of $p$ and $1 - p$. Therefore, without loss of generality, we assume that $\alpha = p \le 1/2$.


Our assumption $2^d p^d r \ge 3$ guarantees that $\log\log(2^d p^d r) = \Omega(1)$ and hence, by choosing the implicit constants on the right-hand side of Theorem 1.7 to be sufficiently large, we can assume that $2^d p^d r$ is greater than 100 (say).
Let $\eta_1, \dots, \eta_n$ and $\xi_1', \dots, \xi_n'$ be independent Bernoulli random variables with $\mathbf{P}(\eta_i = 0) = 1/2$ and $\mathbf{P}(\xi_i' = 0) = 1 - 2p$. Let $\xi_i = \eta_i \xi_i'$; then $\xi_1, \dots, \xi_n$ are iid Bernoulli variables with $\mathbf{P}(\xi_i = 0) = 1 - p$. Therefore, we need to bound $\mathbf{P}(P(\xi_1, \dots, \xi_n) \in I)$.
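The reduction rests on the fact that the product of an unbiased bit and a $2p$-biased bit is a $p$-biased bit. A minimal Python sketch that computes the law of the product exactly (the function name is ours, and $p \le 1/2$ is assumed so that $2p$ is a valid probability):

```python
from fractions import Fraction

def law_of_product(p):
    """Exact law of xi = eta * xi', where P(eta = 0) = 1/2 and P(xi' = 0) = 1 - 2p."""
    law = {0: Fraction(0), 1: Fraction(0)}
    for eta, w_eta in ((0, Fraction(1, 2)), (1, Fraction(1, 2))):
        for xi_prime, w_xi in ((0, 1 - 2 * p), (1, 2 * p)):
            # eta and xi' are independent, so the joint weight is the product
            law[eta * xi_prime] += w_eta * w_xi
    return law
```

For example, `law_of_product(Fraction(1, 5))` puts mass $1/5$ on the value 1 and $4/5$ on the value 0.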

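From the definition of $\operatorname{rank}(P)$, there exist disjoint sets $S_1, \dots, S_r$ such that $|a_{S_j}| \ge 1$ for all $j = 1, \dots, r$. We have $P(\xi_1, \dots, \xi_n) = \sum_{S \subseteq [n], |S| \le d} \big(a_S \prod_{i \in S} \xi_i'\big) \prod_{i \in S} \eta_i$. Conditioning on the $\xi_i'$'s, $P$ becomes a polynomial of degree $d$ in terms of the $\eta_i$ whose coefficients associated with $S_j$ are $b_{S_j} := a_{S_j} \prod_{i \in S_j} \xi_i'$ accordingly. For each such $j$, one has

\[ \mathbf{P}(|b_{S_j}| \ge 1) = \mathbf{P}(\xi_i' = 1, \ \forall i \in S_j) = (2p)^d. \]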


Now, since the sets $S_j$ are disjoint, the events $|b_{S_j}| \ge 1$ are independent. Define $X = \sum_{j=1}^{r} \mathbf{1}_{|b_{S_j}| \ge 1}$, so that $\mathbf{E}X = 2^d p^d r$.

By the classical Chernoff bound we have, for $0 < \gamma < 1$, $\mathbf{P}(|X - \mathbf{E}X| \ge \gamma \mathbf{E}X) \le 2e^{-\gamma^2 \mathbf{E}X/3}$. Thus, taking $\gamma = 1/2$, we conclude that with probability at least $1 - \exp(-2^{d-1}p^d r/6)$, there are at least $2^{d-1}p^d r$ indices $j$ with $|b_{S_j}| \ge 1$. Conditioning on this event, we obtain a polynomial of degree $d$ in terms of $\eta_1, \dots, \eta_n$ which has rank at least $2^{d-1}p^d r$. The theorem now follows from applying Theorem 1.6 to this polynomial and noting that the additional error of $\exp(-2^{d-1}p^d r/6)$ is smaller than both terms from Theorem 1.6.
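The Chernoff step can be illustrated numerically. The number of indices $j$ with $|b_{S_j}| \ge 1$ is a Binomial$(r, (2p)^d)$ variable, and the proof needs it to be at least half its mean except with probability $\exp(-2^{d-1}p^d r/6)$. The Python sketch below compares the exact lower tail with that failure bound for small parameters; the function names and the parameter values in the usage note are ours.

```python
import math

def lower_tail(r, q, threshold):
    """Exact P(X < threshold) for X ~ Binomial(r, q)."""
    return sum(math.comb(r, j) * q**j * (1 - q)**(r - j)
               for j in range(r + 1) if j < threshold)

def failure_bound(r, d, p):
    """exp(-2^(d-1) p^d r / 6), the failure probability quoted in the proof."""
    return math.exp(-(2 ** (d - 1)) * p**d * r / 6)

def half_mean(r, d, p):
    """2^(d-1) p^d r, i.e. half of E X = (2p)^d r."""
    return (2 ** (d - 1)) * p**d * r
```

With `r = 60`, `d = 2`, `p = 0.3`, the mean is $21.6$, and `lower_tail(60, 0.36, half_mean(60, 2, 0.3))` is far below `failure_bound(60, 2, 0.3)`, as the Chernoff bound guarantees.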


6.2. Proof of Theorem 1.8. By replacing $P(x_1, \dots, x_n)$ by $Q(x_1, \dots, x_n) = P(x_1 + y_1, \dots, x_n + y_n)$ and $\xi_i$ by $\xi_i - y_i$, we can also assume without loss of generality that $y_i = 0$ for all $i$. Furthermore, we can assume that $\mathbf{P}(\xi_i < 0) = p$ for all $i$. Indeed, if for some $i$, $\mathbf{P}(\xi_i > 0) = p$, we replace $\xi_i$ by $-\xi_i$ and modify the polynomial $P$ accordingly to reduce to the case $\mathbf{P}(\xi_i < 0) = p$. And then the proof runs along the same lines
as in the case P(^i < 0 ) = 0 . 


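For each $i = 1, \dots, n$, let $\xi_i^+$ and $\xi_i^-$ be independent random variables satisfying $\mathbf{P}(\xi_i^+ \in A) = \mathbf{P}(\xi_i \in A \mid \xi_i > 0)$ and $\mathbf{P}(\xi_i^- \in A) = \mathbf{P}(\xi_i \in A \mid \xi_i < 0)$ for all measurable subsets $A \subset \mathbb{R}$. Let $\eta_1, \dots, \eta_n$ be iid Bernoulli random variables (independent of all previous random variables) such that $\mathbf{P}(\eta_i = 0) = p$. Let $\xi_i' = \eta_i \xi_i^+ + (1 - \eta_i) \xi_i^-$; then $\xi_i'$ and $\xi_i$ have the same distribution. Therefore, it suffices to bound the probability that $P(\xi_1', \dots, \xi_n')$ belongs to $I$. One has

\[ P(\xi_1', \dots, \xi_n') = P\big(\eta_1(\xi_1^+ - \xi_1^-) + \xi_1^-, \dots, \eta_n(\xi_n^+ - \xi_n^-) + \xi_n^-\big) = \sum_{S \subseteq [n], |S| = d} \Big(a_S \prod_{i \in S} (\xi_i^+ - \xi_i^-)\Big) \prod_{i \in S} \eta_i + Q, \]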

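where $Q$ is some polynomial which has degree less than $d$ in terms of the $\eta_i$ when all the $\xi_i^\pm$ are fixed. From the definition of $\operatorname{rank}(P)$, let $S_1, \dots, S_r$ be disjoint subsets of $[n]$ with $|a_{S_j}| \ge 1$ for all $1 \le j \le r$. Conditioning on the variables $\xi_i^\pm$, the polynomial $P$ becomes a polynomial of degree $d$ in terms of the $\eta_i$ whose coefficients associated with $S_j$ are $b_{S_j} := a_{S_j} \prod_{i \in S_j} (\xi_i^+ - \xi_i^-)$ accordingly. For each such $j$, one has

\[ \mathbf{P}_{\xi_1^\pm, \dots, \xi_n^\pm}(|b_{S_j}| \ge 1) \ge \mathbf{P}(\xi_i^+ - \xi_i^- \ge 1, \ \forall i \in S_j). \]

Since $\xi_i^+ > 0 > \xi_i^-$ a.e., one has $2\mathbf{P}(\xi_i^+ - \xi_i^- \ge 1) \ge \mathbf{P}(\xi_i \ge 1) + \mathbf{P}(\xi_i \le -1) = \mathbf{P}(|\xi_i| \ge 1) \ge \varepsilon$. Hence,

\[ \mathbf{P}_{\xi_1^\pm, \dots, \xi_n^\pm}(|b_{S_j}| \ge 1) \ge 2^{-d} \varepsilon^d. \]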


Now, since the sets $S_j$ are disjoint, the events $|b_{S_j}| \ge 1$ are independent. Therefore, using a Chernoff-type bound as in the proof of Theorem 1.7, one can conclude that with probability at least $1 - \exp(-2^{-d}\varepsilon^d r/12)$, there are at least $r 2^{-d} \varepsilon^d/2$ indices $j$ with $|b_{S_j}| \ge 1$. Conditioning on this event, we obtain a polynomial of degree $d$ in terms of $\eta_1, \dots, \eta_n$ which has rank at least $r 2^{-d} \varepsilon^d/2$. Using Theorem 1.7 one obtains the desired bound.
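The proof of Theorem 1.8 replaces each $\xi_i$ by a mixture of its positive and negative parts selected by an independent Bernoulli variable $\eta_i$. The Python sketch below checks on a small discrete example that this mixture reproduces the original law. It assumes a finite distribution with no atom at 0 and with mass on both sides of 0; the function name and the example distribution are ours.

```python
from fractions import Fraction

def split_and_mix(law):
    """law: dict mapping value -> probability, with no atom at 0.

    Returns p = P(xi < 0) and the law of eta*xi_plus + (1 - eta)*xi_minus,
    where P(eta = 0) = p and xi_plus, xi_minus are xi conditioned on its sign.
    """
    p = sum(w for v, w in law.items() if v < 0)
    plus = {v: w / (1 - p) for v, w in law.items() if v > 0}   # law of xi^+
    minus = {v: w / p for v, w in law.items() if v < 0}        # law of xi^-
    mixed = {}
    # eta = 0 (probability p) selects xi^-; eta = 1 (probability 1 - p) selects xi^+
    for v, w in minus.items():
        mixed[v] = mixed.get(v, Fraction(0)) + p * w
    for v, w in plus.items():
        mixed[v] = mixed.get(v, Fraction(0)) + (1 - p) * w
    return p, mixed
```

For the law with masses $1/10, 1/5, 3/10, 2/5$ at $-2, -1, 1, 3$, this gives $p = 3/10$ and returns the same law.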


7. Proof of Theorem 2.5

Let $a$ be an integer to be chosen later. Let $D = \lfloor \log_a(\log_2 n - 1) \rfloor$ be the largest integer such that $2^{-a^D} \ge 2/n$.

Let $\mu$ be the distribution obtained by the following procedure:

(1) With probability 1/2 output $x = 0$ (the all 0's vector).

(2) With probability 1/2 pick an index $i \in \{1, \dots, D\}$ uniformly at random and output $x \sim \mu^n_{2^{-a^i}}$.
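The two-step procedure above can be sketched as a sampler. The Python code below is a minimal illustration under stated assumptions: $n$ is a power of two and $2 \le a \le \log_2 n - 1$ (so that $D \ge 1$ and the comparison defining $D$ is exact), and step (2) draws $n$ independent bits that equal 1 with probability $2^{-a^i}$. The names `largest_D` and `sample_mu` are ours.

```python
import random

def largest_D(n, a):
    """Largest integer D with 2**(-a**D) >= 2/n, i.e. a**D <= log2(n) - 1.

    Assumes n is a power of two and 2 <= a <= log2(n) - 1.
    """
    m = n.bit_length() - 1          # log2(n) for a power of two
    D = 0
    while a ** (D + 1) <= m - 1:
        D += 1
    return D

def sample_mu(n, a, rng):
    """One draw from mu: the all-zeros vector with probability 1/2, otherwise
    i uniform in {1, ..., D} and n independent bits with P(bit = 1) = 2**(-a**i)."""
    if rng.random() < 0.5:
        return [0] * n
    i = rng.randint(1, largest_D(n, a))
    p = 2.0 ** (-(a ** i))
    return [1 if rng.random() < p else 0 for _ in range(n)]
```

For `n = 1024` and `a = 2` this gives `D = 3`, since $2^{-8} \ge 2/1024 > 2^{-16}$.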

We next show that for some constant $c > 0$, there exists no polynomial $P$ of degree $d < c(\log\log n)/(\log\log\log n)$ such that $\mathbf{P}_{x \sim \mu}(P(x) = \mathrm{OR}(x)) \ge 2/3$. Let $P$ be such a polynomial. Then, necessarily, $P(0) = 0$; as $\mathbf{P}_{x \sim \mu}(P(x) = 0) \le 1/2 + (1/2)(1 - 2^{-a^D})^n \le 1/2 + (1/2)(1 - 2/n)^n < 2/3$, there must exist a set of indices $I \subseteq [D]$ with $|I| = \Omega(D)$ such that for all $i \in I$,

\[ \mathbf{P}_{x \sim \mu^n_{2^{-a^i}}}(P(x) = 1) = \Omega(1). \]


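Let $I = \{i_1 < i_2 < \dots < i_k\}$ and for $\ell \in [k]$, let $p_\ell = 2^{-a^{i_\ell}}$. Now, by Theorem 1.7 applied to the polynomial $P - 1$ and $x \sim \mu_{p_1}^n$, we get that either $\operatorname{rank}(P) < 3/(2p_1)^d$ or

\[ \Omega(1) = \mathbf{P}(P(x) = 1) \le O(d^{4/3}) \, \frac{\big(\log(\operatorname{rank}(P)(2p_1)^d)\big)^{1/2}}{\big(\operatorname{rank}(P)(2p_1)^d\big)^{1/(4d+1)}}. \]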

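Hence, in any case, $\operatorname{rank}(P) < r_1 = d^{O(d)}/p_1^d$. This in turn implies that there exists a set of $r_1 \cdot d$ indices $S_1 \subseteq [n]$ such that the polynomial $P_1 = P_{S_1}$ obtained by assigning the variables in $S_1$ to 0 is of degree at most $d - 1$. Further, for $x \sim \mu_{p_2}^n$,
\[
\begin{aligned}
\Omega(1) = \mathbf{P}_x(P(x) = 1) &= \mathbf{P}(x_{S_1} = 0) \cdot \mathbf{P}_x(P(x) = 1 \mid x_{S_1} = 0) + \mathbf{P}(x_{S_1} \ne 0) \cdot \mathbf{P}_x(P(x) = 1 \mid x_{S_1} \ne 0) \\
&\le \mathbf{P}_{x \sim \mu_{p_2}^{[n] \setminus S_1}}(P_1(x) = 1) + \mathbf{P}(x_{S_1} \ne 0).
\end{aligned}
\]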

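Thus,

\[ \mathbf{P}_{x \sim \mu_{p_2}^{[n] \setminus S_1}}(P_1(x) = 1) \ge \Omega(1) - d^{O(d)+1} \cdot (p_2/p_1^d) = \Omega(1) - d^{O(d)+1} 2^{-a^{i_2} + d a^{i_1}} \ge \Omega(1) - d^{O(d)} 2^{-a^{i_1}}, \]

for $a \ge 2d$. Further, note that $P_1(0) = 0$.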
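The last inequality in the display uses that, for integers $i_1 < i_2$ and $a \ge 2d$, the exponent satisfies $-a^{i_2} + d a^{i_1} \le -a^{i_1}$ (because $a^{i_2} \ge a \cdot a^{i_1}$ and $a - d \ge d \ge 1$). A small Python sketch checks this over a range of integer parameters; the function names are ours.

```python
def exponent_gap_ok(a, d, i1, i2):
    """Check -a**i2 + d * a**i1 <= -a**i1 for integers i1 < i2."""
    return -(a ** i2) + d * a ** i1 <= -(a ** i1)

def check_range(max_d=6, max_i=4):
    """Verify the inequality for 1 <= d <= max_d, a in {2d, ..., 2d+3}, i1 < i2."""
    return all(exponent_gap_ok(a, d, i1, i2)
               for d in range(1, max_d + 1)
               for a in range(2 * d, 2 * d + 4)
               for i1 in range(1, max_i + 1)
               for i2 in range(i1 + 1, i1 + 4))
```

The case $d = 1$, $a = 2$, $i_2 = i_1 + 1$ gives equality, so the condition $a \ge 2d$ cannot be relaxed in this particular estimate.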


Iterating the argument with $P_1$ and so forth, we get a sequence of polynomials $P_1, P_2, \dots, P_{k-1}$ such that for $1 \le j \le \min(d, k - 1)$, $P_j$ is of degree at most $d - j$, $P_j(0) = 0$ and for $x \sim \mu_{p_{j+1}}$,

\[ \mathbf{P}_x(P_j(x) = 1) = \Omega(1) - d^{O(d)+j} 2^{-a}. \]

This clearly leads to a contradiction if $k > d$ and $a \ge Cd\log d$ for a large enough constant $C$ (so that the right-hand side of the above equation is non-zero for $j = d$).

Therefore, setting $a = Cd\log d$, for a sufficiently big constant $C$, we must have $k = \Omega(D) \le d$. That is, $\log_2 n - 1 = a^{O(d)} = d^{O(d)}$. Thus, we must have $d = \Omega(1)(\log\log n)/(\log\log\log n)$.
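The final step inverts a relation of the form $d^{O(d)} \ge \log_2 n$. As a rough illustration only, the Python sketch below takes the hidden constant to be 1 (an assumption made for this example), writes $L = \log_2 n$, and compares the smallest $d$ with $d^d \ge L$ against $\log L/\log\log L$, which has the order of $(\log\log n)/(\log\log\log n)$. The function names are ours.

```python
import math

def min_degree(L):
    """Smallest integer d >= 1 with d**d >= L (hidden constant in d^{O(d)} taken as 1)."""
    d = 1
    while d ** d < L:
        d += 1
    return d

def heuristic(L):
    """log L / log log L, a lower bound on min_degree(L) when L > e**e."""
    return math.log(L) / math.log(math.log(L))
```

For $L = 65536$, that is $n = 2^{65536}$, the smallest such $d$ is 7, since $6^6 = 46656 < 65536 \le 7^7$.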